Text Classification: Forming Candidate Key-Phrases from Existing Shorter Ones
Authors
Abstract
Text classification is a hard problem with many aspects and potential solutions. This paper considers two main research directions for classifying narrative documents. The first is based on data mining and rule induction techniques, while the second combines traditional text retrieval techniques (use of the vector space model,
Similar resources
Naive Rule Induction for Text Classification based on Key-phrases
In this paper, we focus on the induction of naive rules for classifying text documents. We briefly describe an algorithm that creates key-phrases from a given set of documents; these key-phrases are then organized and used as features for the automatic classification of new documents. The algorithm specifies an authority list of key-phrases containing key-phrases that are frequent w...
Key-phrase Extraction for Classification
In this paper we consider the problem of extracting key-phrases from a bilingual text collection and using them for text classification. A key-phrase can be defined as a sequence of words of a given size, in a given partial order, that occurs within a sentence. We describe an algorithm for the discovery of key-phrases. Then, a framework for handling multilingual texts / documents is described wh...
A New Text Mining Method for Extracting User Context Information to Improve the Ranking of Search Engine Results
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual and documentary material increases day by day, so we need effective ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the most similar ones to the user ...
A New Method of Region Embedding for Text Classification
Representing a text as a bag of properly identified “phrases” and using that representation to process the text has proved useful. The key question is how to identify the phrases and represent them. The traditional method of using n-grams can be regarded as an approximation of this approach. Such a method can suffer from data sparsity, however, particularly when the length of n-gr...
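The n-gram approximation and the data sparsity mentioned in the last abstract above can be illustrated with a minimal sketch. It is not the method of any listed paper: the helper names `word_ngrams` and `singleton_ratio`, the whitespace tokenization, and the toy corpus are all assumptions made for illustration. The sketch counts contiguous word n-grams and reports the fraction of distinct n-grams that occur only once, a rough proxy for sparsity.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def singleton_ratio(sentences, n):
    """Fraction of distinct n-grams that occur exactly once in the corpus."""
    counts = Counter()
    for sentence in sentences:
        # Naive lowercasing and whitespace tokenization (an assumption).
        counts.update(word_ngrams(sentence.lower().split(), n))
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


# Toy corpus, invented for illustration only.
corpus = [
    "text classification is a hard problem",
    "text classification uses key phrases",
    "key phrases help text classification",
]

for n in (1, 2, 3):
    print(n, round(singleton_ratio(corpus, n), 2))
```

On this toy corpus the share of n-grams seen only once grows as n increases, which is the sparsity effect the abstract alludes to; real collections would need proper tokenization and far more data to say anything quantitative.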